Dynamic Cost Sensitive Learning for Imbalanced Text Classification

Chang-Uk Shin (신창욱); Jinyoung Oh (오진영); Jeong-Won Cha (차정원)

KIISE Transactions on Computing Practices

Korean Title	범주 불균형 분류 문제를 위한 동적 비용 민감 학습 방법 (A Dynamic Cost-Sensitive Learning Method for Category-Imbalanced Classification Problems)
English Title	Dynamic Cost Sensitive Learning for Imbalanced Text Classification
Authors	Chang-Uk Shin (신창욱); Jinyoung Oh (오진영); Jeong-Won Cha (차정원)
Citation	Vol. 26, No. 4, pp. 211-216 (April 2020)
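Korean Abstract (translated)	Category imbalance in a training dataset biases any classification model trained on it. In this study, we propose two new cost-sensitive learning methods for training a classification model on a given category-imbalanced dataset. The first method uses the per-category occurrence frequencies in the training corpus together with the Dirichlet distribution. This method, which we call dynamic weighting, draws a sample from a Dirichlet distribution and uses it as the weights for model training. The second method performs cost-sensitive learning by changing the answer representation according to the per-category occurrence frequencies in the training corpus. We call this method fuzzy answer representation. Applying the proposed methods to classifying the emotion and speech act of utterances in dialogue improved MAP (Macro Average Precision) by about 1.1-2.2%p for speech acts and about 0.9-3.6%p for emotions. The experimental results confirm that the proposed methods are effective for training on category-imbalanced datasets.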
English Abstract	Category imbalance in a training dataset biases the classification model trained on it. In this paper, we propose two new cost-sensitive learning methods for training classification models on a category-imbalanced dataset. The first method uses the per-category occurrence rate in the dataset together with the Dirichlet distribution. This method, called the dynamic weighting method, draws a sample from the distribution and uses it as the weights of the loss function. The second method trains the model after changing how the answer is represented, according to the occurrence rate of each category in the training corpus. This method is called fuzzy answer representation. Applied to classifying the emotions and speech acts of utterances in dialogue, the proposed methods improved MAP (Macro Average Precision) by approximately 1.1-2.2%p for speech act classification and 0.9-3.6%p for emotion classification. The experimental results show that the proposed methods are effective for training on category-imbalanced datasets.
Keywords	text classification; category imbalanced classification; cost-sensitive learning; utterance speech-act classification; utterance emotion classification
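The abstract describes dynamic weighting only at a high level: per-category frequencies feed a Dirichlet distribution, and a sample from that distribution weights the loss. The Python sketch below illustrates one plausible reading under stated assumptions and is not the authors' implementation. It assumes the Dirichlet concentration parameters are proportional to inverse class frequency (rescaled by a hypothetical `scale` parameter), that a fresh weight vector is drawn at every training step, and that the sample weights a standard cross-entropy loss. The helper names (`concentration_from_frequencies`, `dirichlet_sample`, `weighted_cross_entropy`) are hypothetical. Dirichlet sampling uses the standard construction of normalizing independent Gamma draws, via `random.gammavariate` from the Python standard library.

```python
import math
import random
from collections import Counter


def concentration_from_frequencies(labels, num_classes, scale=10.0):
    """Map per-class counts to Dirichlet concentration parameters.

    ASSUMPTION (not from the paper): alpha_k is proportional to 1 / count_k,
    rescaled so that sum(alpha) == scale. `scale` is a hypothetical knob:
    larger values keep sampled weights closer to their mean.
    """
    counts = Counter(labels)
    if any(counts[k] == 0 for k in range(num_classes)):
        raise ValueError("every class must occur at least once in `labels`")
    inverse = [1.0 / counts[k] for k in range(num_classes)]
    total = sum(inverse)
    return [scale * v / total for v in inverse]


def dirichlet_sample(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(batch_logits, batch_labels, class_weights):
    """Batch mean of w[y] * -log p(y | x).

    The Dirichlet sample sums to 1, so it is multiplied by the number of
    classes to make the average class weight 1 (a hypothetical choice).
    """
    k = len(class_weights)
    w = [k * c for c in class_weights]
    losses = []
    for logits, y in zip(batch_logits, batch_labels):
        p = softmax(logits)
        losses.append(-w[y] * math.log(p[y]))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced label list (hypothetical data): 80 / 15 / 5 examples.
    train_labels = [0] * 80 + [1] * 15 + [2] * 5
    alpha = concentration_from_frequencies(train_labels, num_classes=3)
    batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]]
    batch_labels = [0, 2]
    # A fresh weight vector is drawn at every step, which is what makes the
    # weighting "dynamic" in this sketch.
    for step in range(3):
        weights = dirichlet_sample(alpha, rng)
        loss = weighted_cross_entropy(batch_logits, batch_labels, weights)
        print(step, [round(x, 3) for x in weights], round(loss, 4))
```

In this sketch, redrawing the weights each step keeps rare classes up-weighted on average while adding noise to the exact weighting; a fixed inverse-frequency weighting is the special case in which each sample is replaced by its mean, `alpha / sum(alpha)`.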